Cape Town University OpenUCT

Frequent toggling between alternative amino acids is driven by selection in HIV-1

Author: Delport Wayne
Scheffler Konrad
Seoighe Cathal
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2008

Author Summary: Viruses, such as HIV, are able to evade host immune responses through escape mutations, yet sometimes they do so at a cost. This cost is the reduction in the ability of the virus to replicate, and thus selective pressure exists for a virus to revert to its original state in the absence of the host immune response that caused the initial escape mutation. This pattern of escape and reversion typically occurs when viruses are transmitted between individuals with different immune responses. We develop a phylogenetic model of immune escape and reversion and provide evidence that it outperforms existing models for the detection of selective pressure associated with host immune responses. Finally, we demonstrate that amino acid toggling is a pervasive process in HIV-1 evolution, such that many of the positions in the virus that evolve rapidly, under the influence of positive Darwinian selection, nonetheless display quite low sequence diversity. This highlights the limitations of HIV-1 evolution, and sites such as these are potentially good targets for HIV-1 vaccines.

Benchmarking multi-rate codon models

Author: Delport Wayne
Gravenor Mike B.
Muse Spencer V.
Pond Sergei Kosakovsky
Scheffler Konrad
Publication venue: Public Library of Science (PLoS)
Publication date: 21/07/2010

CITATION: Delport, W. et al. 2010. Benchmarking multi-rate codon models. PLoS ONE, 5(7): e11587, doi:10.1371/journal.pone.0011587. The original publication is available at http://journals.plos.org/plosone

The single rate codon model of non-synonymous substitution is ubiquitous in phylogenetic modeling. Indeed, the use of a non-synonymous to synonymous substitution rate ratio parameter has facilitated the interpretation of selection pressure on genomes. Although the single rate model has achieved wide acceptance, we argue that the assumption of a single rate of non-synonymous substitution is biologically unreasonable, given observed differences in substitution rates evident from empirical amino acid models. Some have attempted to incorporate amino acid substitution biases into models of codon evolution and have shown improved model performance versus the single rate model. Here, we show that the single rate model of non-synonymous substitution is easily outperformed by a model with multiple non-synonymous rate classes, yet in which amino acid substitution pairs are assigned randomly to these classes. We argue that, since the single rate model is so easy to improve upon, new codon models should not be validated entirely on the basis of improved model fit over this model. Rather, we should strive to both improve on the single rate model and to approximate the general time-reversible model of codon substitution, with as few parameters as possible, so as to reduce model over-fitting. We hint at how this can be achieved with a Genetic Algorithm approach in which rate classes are assigned on the basis of sequence information content.

© 2010 Delport et al. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0011587 (Publisher's version)
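
The random-assignment baseline mentioned in this abstract can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the authors' procedure: it enumerates all 190 unordered pairs of the 20 standard amino acids (a codon model would normally consider only pairs reachable by a single nucleotide change), and the function name, the class count of 5 and the seed are hypothetical choices.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

def random_rate_classes(n_classes, seed=0):
    """Assign every unordered amino acid pair to a rate class chosen uniformly at random.

    Hypothetical helper: the pair set (all 190 pairs) is a simplification.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    pairs = list(combinations(AMINO_ACIDS, 2))
    return {pair: rng.randrange(n_classes) for pair in pairs}

classes = random_rate_classes(n_classes=5)
print(len(classes))  # number of unordered amino acid pairs that received a class
```

Each class would then share one non-synonymous rate parameter, so the model has far fewer free rates than one rate per pair.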

Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

Author: C Kosiol
C Seoighe
G Schwarz
GC Conant
Konrad Scheffler
M Anisimova
M Lacerda
N Goldman
S Whelan
Sergei Kosakovsky Pond
SL Kosakovsky Pond
Spencer V. Muse
SV Muse
Thomas Mailund
W Delport
Wayne Delport
Publication venue: Public Library of Science
Publication date: 01/01/2010

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the F3×4 approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the F3×4-style estimators.
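
The source of the bias described in this abstract can be seen in a small Python sketch. It implements only the conventional product-of-position-frequencies estimator (often called F3×4), renormalised over sense codons; it does not implement the paper's corrected estimator. The universal genetic code and the uniform input frequencies are assumptions made for illustration, and the function names are hypothetical.

```python
from itertools import product

NUCS = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}  # assumption: universal genetic code

def f3x4(pos_freqs):
    """Codon frequencies as products of position-specific nucleotide frequencies,
    renormalised after stop codons are dropped."""
    raw = {}
    for codon in map("".join, product(NUCS, repeat=3)):
        if codon in STOPS:
            continue
        p = 1.0
        for position, nuc in enumerate(codon):
            p *= pos_freqs[position][nuc]
        raw[codon] = p
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}

def implied_position_freqs(codon_freqs):
    """Nucleotide frequencies at each codon position implied by a codon distribution."""
    out = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for codon, p in codon_freqs.items():
        for position, nuc in enumerate(codon):
            out[position][nuc] += p
    return out

uniform = [{n: 0.25 for n in NUCS} for _ in range(3)]  # illustrative input
implied = implied_position_freqs(f3x4(uniform))
print(implied[0]["T"])  # 13/61, not the 0.25 that was supplied
```

Because all three stop codons begin with T, dropping them lowers the implied frequency of T at the first position below the value that was fed in. A corrected estimator has to account for this mismatch.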

CiteSeerX

Experimental evidence indicating that mastreviruses probably did not co-diverge with their hosts

Author: Briddon Rob W
Delport Wayne
Donaldson Lara
Duffy Siobain
Harkins Gordon W
Martin Darren P
Monjane Adérito L
Owor Betty E
Rybicki Edward P
Saumtally Salem
Shepherd Dionne N
Triton Guy
Varsani Arvind
Wood Natasha
Publication venue: BioMed Central
Publication date: 01/01/2009

Background. Despite the demonstration that geminiviruses, like many other single stranded DNA viruses, are evolving at rates similar to those of RNA viruses, a recent study has suggested that grass-infecting species in the genus Mastrevirus may have co-diverged with their hosts over millions of years. This "co-divergence hypothesis" requires that long-term mastrevirus substitution rates be at least 100,000-fold lower than their basal mutation rates and 10,000-fold lower than their observable short-term substitution rates. The credibility of this hypothesis, therefore, hinges on the testable claim that negative selection during mastrevirus evolution is so potent that it effectively purges 99.999% of all mutations that occur. Results. We have conducted long-term evolution experiments lasting between 6 and 32 years, where we have determined substitution rates of between 2 and 3 × 10⁻⁴ substitutions/site/year for the mastreviruses Maize streak virus (MSV) and Sugarcane streak Réunion virus (SSRV). We further show that mutation biases are similar for different geminivirus genera, suggesting that mutational processes that drive high basal mutation rates are conserved across the family. Rather than displaying signs of extremely severe negative selection as implied by the co-divergence hypothesis, our evolution experiments indicate that MSV and SSRV are predominantly evolving under neutral genetic drift. Conclusion. The absence of strong negative selection signals within our evolution experiments and the uniformly high geminivirus substitution rates that we and others have reported suggest that mastreviruses cannot have co-diverged with their hosts. © 2009 Harkins et al; licensee BioMed Central Ltd.
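
The arithmetic linking the "100,000-fold lower" requirement to the "99.999%" figure in this abstract can be checked with a few lines of Python. This is a worked example only; both function names are hypothetical, and the rate helper simply encodes the unit substitutions/site/year.

```python
def purged_fraction(fold_reduction):
    """Fraction of mutations that selection must remove for the long-term
    substitution rate to be `fold_reduction` times lower than the mutation rate."""
    return 1.0 - 1.0 / fold_reduction

def substitution_rate(n_substitutions, n_sites, n_years):
    """Substitutions per site per year from raw counts."""
    return n_substitutions / (n_sites * n_years)

print(f"{purged_fraction(100_000):.3%}")  # 99.999%
```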

Cape Town University OpenUCT

Springer - Publisher Connector

UC Research Repository

Queensland University of Technology ePrints Archive

National Research Foundation

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE.
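
One reason the positive semi-definite property matters is that any such similarity score induces a distance through d(x, y)² = k(x, x) + k(y, y) − 2 k(x, y). The Python sketch below illustrates that general construction only. It swaps in a simple k-mer spectrum kernel, which is not the transducer-based score of the paper and models neither indels nor substitution rates; all function names and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k=3):
    """Dot product of k-mer count vectors (a standard positive semi-definite kernel)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(count * cy[kmer] for kmer, count in cx.items() if kmer in cy)

def kernel_distance(x, y, kernel=spectrum_kernel):
    """Distance induced by a positive semi-definite kernel."""
    squared = kernel(x, x) + kernel(y, y) - 2 * kernel(x, y)
    return math.sqrt(max(squared, 0.0))  # clamp guards against rounding below zero

print(kernel_distance("ACGTACGT", "ACGTTCGT"))
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method without first constructing a multiple alignment.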

arXiv.org e-Print Archive

MDC Repository

CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
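
The search strategy outlined in this abstract can be sketched as a generic genetic algorithm over class assignments. The Python below is a toy under loud assumptions: the scoring function is a stand-in (in a real analysis it would be a likelihood-based model-fit measure computed from the alignment), the number of classes is fixed rather than estimated, and every name, population size and rate is hypothetical rather than taken from CodonTest.

```python
import random

def mutate(assignment, n_classes, rate, rng):
    """Reassign each pair to a random class with probability `rate`."""
    return [rng.randrange(n_classes) if rng.random() < rate else c for c in assignment]

def crossover(a, b, rng):
    """Single-point crossover of two assignments."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def run_ga(score, n_pairs, n_classes, pop_size=30, generations=50, rate=0.05, seed=0):
    """Evolve assignments of pairs to classes; higher `score` is better."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_classes) for _ in range(n_pairs)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n_classes, rate, rng))
        pop = elite + children
    return max(pop, key=score)

# Stand-in score: agreement with an arbitrary target assignment (illustration only).
TARGET = [i % 3 for i in range(20)]

def toy_score(assignment):
    return sum(x == t for x, t in zip(assignment, TARGET))

best = run_ga(toy_score, n_pairs=20, n_classes=3)
print(toy_score(best), "of", len(TARGET), "pairs match the toy target")
```

Keeping the better half of each generation means the best score never decreases, which is a common design choice when each score evaluation is expensive.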

Cronfa at Swansea University

Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics

Author: A Stamatakis
AR Lemmon
AWF Edwards
B Kolaczkowski
BP Carlin
Bryan Kolaczkowski
D Hillis
D Penny
DJ Taylor
DL Swofford
DM Hillis
E Mossel
E Susko
F Delsuc
F Ronquist
FE Anderson
H Akaike
H Brinkmann
J Bergsten
J Felsenstein
Joseph W. Thornton
JP Huelsenbeck
JS Rogers
JT Chang
K Misawa
KG Karol
M Alfaro
M Anisimova
M Holder
M Pagel
M Spencer
ME Alfaro
MK Kuhner
MP Cummings
MP Simmons
N Saitou
P Erixon
P Lewis
P Lopez
PO Lewis
RC Jeffrey
S Guindon
Wayne Delport
WJ Bruno
WJ Murphy
Y Inagaki
Y Suzuki
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias, which is apparent under both controlled simulation conditions and in analyses of empirical sequence data, also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages: that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.

CiteSeerX

University of Memphis Digital Commons

9-Genes Reinforce the Phylogeny of Holometabola and Yield Alternate Views on the Phylogenetic Placement of Strepsiptera

Author: A Handlirsch
A Rokas
A Ronquist
AE Shipley
AG Böving
AL Wild
AP Rasnitsyn
BM Wiegmann
Brian D. Farrell
D Carmean
D Grimaldi
D Posada
DC Hayward
DD McKenna
DJ Zwickl
DL Swofford
Duane D. McKenna
EK Buschbeck
EM Zdobnov
F Bonneton
F Bravo
F Friedrich
F Proffitt
G Talavera
H Philippe
H Pohl
HH Ross
J Castresana
J Felsenstein
J Kathirithamby
J Kukalova-Peck
J Savard
JF Lawrence
JP Huelsenbeck
K Katoh
KC Nixon
M Kuhner
MF Whiting
N Chalwatzis
NP Kristensen
P Rossi
PA Latreille
RA Crowson
RG Beutel
RK Kinzelbach
SJ Longhorn
T Hunt
UW Hwang
V Krauss
W Hennig
W Kirby
Wayne Delport
WC Wheeler
WD Pierce
Publication venue: Public Library of Science
Publication date: 01/07/2010

Background: The extraordinary morphology, reproductive and developmental biology, and behavioral ecology of twisted wing parasites (order Strepsiptera) have puzzled biologists for centuries. Even today, the phylogenetic position of these enigmatic “insects from outer space” [1] remains uncertain and contentious. Recent authors have argued for the placement of Strepsiptera within or as a close relative of beetles (order Coleoptera), as sister group of flies (order Diptera), or even outside of Holometabola. Methodology/Principal Findings: Here, we combine data from several recent studies with new data (for a total of 9 nuclear genes and ∼13 kb of aligned data for 34 taxa), to help clarify the phylogenetic placement of Strepsiptera. Our results unequivocally support the monophyly of Neuropteroidea (= Neuropterida + Coleoptera) + Strepsiptera, but recover Strepsiptera either derived from within polyphagan beetles (order Coleoptera), or in a position sister to Neuropterida. All other supra-ordinal- and ordinal-level relationships recovered with strong nodal support were consistent with most other recent studies. Conclusions/Significance: These results, coupled with the recent proposed placement of Strepsiptera sister to Coleoptera, suggest that while the phylogenetic neighborhood of Strepsiptera has been identified, unequivocal placement to a specific branch within Neuropteroidea will require additional study. Organismic and Evolutionary Biology

Harvard University - DASH